vSphere 5.0 Storage Features Part 2 – Storage vMotion

Storage vMotion allows the migration of running VMs from one datastore to another without incurring any downtime. It was first introduced as a mechanism to help customers move VMs from VMFS-2 to VMFS-3 (without incurring VM downtime) when upgrading from ESX 2.x to ESX 3.0.1. In fact, we didn't even call it Storage vMotion back then; it was referred to as Upgrade vMotion.
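
As an aside, for anyone curious what kicking off a Storage vMotion looks like from an automation perspective, here is a minimal sketch using the open-source pyVmomi SDK and the vSphere API's RelocateVM_Task method. The vCenter address, credentials, VM name and datastore name below are placeholders, and the inventory lookup is deliberately simplistic.

```python
# Minimal sketch of triggering a Storage vMotion via the vSphere API (pyVmomi).
# Hostname, credentials, VM name and datastore name are placeholders.
import ssl
import time

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab/demo only: skip cert checks
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ctx)
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    """Return the first inventory object of the given type with this name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

vm = find_by_name(vim.VirtualMachine, "my-test-vm")
target_ds = find_by_name(vim.Datastore, "Datastore-B")

# A RelocateSpec with only 'datastore' set migrates the VM's storage while it
# keeps running -- i.e. a Storage vMotion rather than a host vMotion.
spec = vim.vm.RelocateSpec(datastore=target_ds)
task = vm.RelocateVM_Task(spec=spec)

while task.info.state not in (vim.TaskInfo.State.success, vim.TaskInfo.State.error):
    time.sleep(2)                               # simple polling for illustration

Disconnect(si)
```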

It quickly became apparent that Storage vMotion had many other uses, such as assisting with storage tiering and moving VMs off a storage device that needed maintenance or was being decommissioned or replaced.

The underlying technology has undergone numerous changes since that initial release, and vSphere 5.0 once again introduces a new migration mechanism that improves the performance and reliability of Storage vMotion operations.

In ESX 3.5, we relied on the traditional VM snapshot mechanism. We took a snapshot of the VM, and from that point onwards the snapshot delta handled the VM's I/O. This meant the base disk was quiesced, so we could begin moving it to the destination datastore. Once the VM's base disk had been migrated, we committed all the changes captured in the snapshot against the disk on the destination datastore. However, if the VM was very busy, the snapshot delta could grow very large, and the commit process could take a significant amount of time to complete.
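
To make that sequence easier to picture, here is a small toy model in Python. It is purely illustrative (plain lists stand in for disks and datastores) and is not actual hypervisor code, but it captures the snapshot / copy / commit ordering described above.

```python
# Toy model only -- not hypervisor code. Lists stand in for disks.
# It mimics the ESX 3.5 flow: snapshot -> copy quiesced base disk -> commit delta.

def snapshot_based_svmotion(src_disk, guest_writes):
    """guest_writes: list of (block_index, data) issued while the copy runs."""
    # 1. "Take a snapshot": from now on, guest writes land in a delta,
    #    leaving the base disk quiesced.
    delta = list(guest_writes)

    # 2. Copy the quiesced base disk to the destination datastore.
    dst_disk = list(src_disk)

    # 3. Commit the delta against the destination copy. A busy VM produces a
    #    large delta, which is why this step could take a long time.
    for block, data in delta:
        dst_disk[block] = data

    # 4. The VM is switched over to the destination disk.
    return dst_disk

src = ["A", "B", "C", "D"]
print(snapshot_based_svmotion(src, [(2, "C'"), (0, "A'")]))  # ["A'", 'B', "C'", 'D']
```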

In the 4.x releases, we improved on this mechanism with a feature called Changed Block Tracking (CBT). This removed the need for snapshots in Storage vMotion, which could grow extremely large for I/O-intensive VMs and therefore take a long time to commit to the destination. CBT keeps track of which disk blocks change after the initial copy. We then went through one or more iterative copy passes until the number of changed blocks was small enough to allow us to switch the running VM to the destination datastore using the Fast Suspend/Resume operation. This mechanism is very similar to how we do vMotion operations over the network. Again, for a very busy VM the migration could take a long time, since we had to go through many copy passes.
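
Again purely as an illustration (and not VMkernel code), here is a toy model of the iterative CBT approach: copy, collect the blocks that were dirtied in the meantime, and repeat until the dirty set is small enough for a short Fast Suspend/Resume.

```python
# Toy model only -- not VMkernel code. Lists stand in for disks, and random
# writes stand in for guest I/O arriving while each copy pass runs.
import random

def cbt_svmotion(src_disk, dirty_rate=0.2, small_enough=1, seed=42):
    rng = random.Random(seed)
    dst_disk = [None] * len(src_disk)
    to_copy = set(range(len(src_disk)))            # first pass: every block
    passes = 0

    while len(to_copy) > small_enough:
        passes += 1
        dirtied = set()                            # CBT: blocks written during this pass
        for block in to_copy:
            dst_disk[block] = src_disk[block]
            if rng.random() < dirty_rate:          # guest keeps writing while we copy
                src_disk[block] = src_disk[block] + "*"
                dirtied.add(block)
        to_copy = dirtied                          # next pass only re-copies dirty blocks

    # Fast Suspend/Resume: the VM is briefly stunned while the last few
    # dirty blocks are copied, then resumes on the destination datastore.
    for block in to_copy:
        dst_disk[block] = src_disk[block]
    return dst_disk, passes

disk = list("ABCDEFGH")
dst, passes = cbt_svmotion(disk)
print(passes, dst == disk)   # number of copy passes, and destination matches source
```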

In vSphere 5.0, we improve on Storage vMotion once again by performing the migration in a single pass rather than multiple iterative copy passes. Storage vMotion in vSphere 5.0 uses a new Mirror Driver mechanism to keep blocks on the destination synchronized with any changes made to the source after the initial copy. The migration process makes a single pass over the disk, copying all the blocks to the destination. If a block changes after it has been copied, the mirror driver synchronizes it from the source to the destination. There is no longer any need for iterative passes, which means the Storage vMotion operation is much shorter: it can complete a migration in a single pass.
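
Here is the same style of toy model for the 5.0 behaviour. The MirrorDriver class below is a made-up stand-in rather than the real driver, but it shows why a single pass is sufficient: once a block has been copied, any further guest write to it is applied to both disks.

```python
# Toy model only -- not VMkernel code. Lists stand in for disks.
# Once a block has been copied, guest writes to it are mirrored to the
# destination, so no block ever needs re-copying.

class MirrorDriver:
    """Sits in the write path; mirrors writes to already-copied blocks."""
    def __init__(self, src_disk, dst_disk):
        self.src, self.dst = src_disk, dst_disk
        self.copied = set()

    def guest_write(self, block, data):
        self.src[block] = data
        if block in self.copied:        # region already migrated:
            self.dst[block] = data      # the write goes to both disks

def mirror_svmotion(src_disk, guest_writes):
    """guest_writes: dict mapping 'after copying block i' -> (block, data)."""
    dst_disk = [None] * len(src_disk)
    mirror = MirrorDriver(src_disk, dst_disk)

    for i in range(len(src_disk)):      # one sequential pass over the disk
        dst_disk[i] = src_disk[i]
        mirror.copied.add(i)
        if i in guest_writes:           # guest write arriving mid-migration
            mirror.guest_write(*guest_writes[i])

    # Destination is already in sync -- the final fast suspend/resume just
    # switches the VM over to the disks on the destination datastore.
    return dst_disk

src = list("ABCD")
print(mirror_svmotion(src, {1: (0, "A'"), 2: (3, "D'")}))  # ["A'", 'B', 'C', "D'"]
print(src)                                                 # source saw the same writes
```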

Some additional Storage vMotion features in vSphere 5.0 are:

  • Storage vMotion will work with Virtual Machines that have snapshots/linked clones.
  • Storage vMotion allows VMware to implement a new balancing technique for VMs based on storage usage and load. This feature is called Storage DRS and will be covered in a future blog posting.

One point which sometimes raises questions is why we observe references to memory pre-copy in the logs during a Storage vMotion. Just to clarify: there is no memory pre-copy. The log messages are simply an artifact of the migrate infrastructure in the VMkernel (the same migrate infrastructure that vMotion uses). The memory is atomically transferred to the destination VM during the final fast suspend/resume (stun) operation, which effectively switches the VM over to the disks on the destination.

A final note on Storage vMotion and network communication, which sometimes causes confusion. Storage vMotion operations are done internally on a single ESXi host (or offloaded in-band to the storage array if the array supports hardware acceleration via VAAI). No communication for a Storage vMotion operation takes place over the 'network' via the Service Console or the management network. Some control operations may take place between vpxa/hostd/vpxd/nfc (network file copy), but all bulk data transfer is internal to the ESXi host, using the VMkernel Data Mover (or offloaded via VAAI as mentioned above). In earlier versions of ESX, we did use nfc (without the Data Mover) to copy some Virtual Machine files (such as logs) over the loopback adapter, but we no longer do this.